Clustering Web Search Results For Effective Arabic Language Browsing
Abstract
Browsing search results is one of the major problems with traditional Web search engines for English, European, and other languages in general, and for Arabic in particular. The process is time consuming and the browsing style is unattractive. Organizing Web search results into clusters helps users browse them quickly. Traditional clustering techniques (data-centric clustering algorithms) are inadequate because they do not generate clusters with highly readable names or labels. To solve this problem, description-centric algorithms such as the Suffix Tree Clustering (STC) algorithm have been introduced and used successfully and extensively, in different adapted versions, for English, European, and Chinese languages. However, to our knowledge, at the time of writing the STC algorithm has never been applied to clustering Arabic Web search result snippets. In this paper, we first study how STC can be applied to Arabic. We then illustrate by example that STC cannot be applied after Arabic snippet pre-processing (stem or root extraction), because the merging process yields many redundant clusters. Second, to overcome this problem, we propose to integrate STC into a new scheme that takes the properties of the Arabic language into account, in order to make the Web better adapted to Arabic users. The proposed approach automatically clusters Web search results into high-quality clusters with meaningful labels. The resulting clusters are not only coherent but also convey their contents to users concisely and accurately, so Arabic users can decide at a glance whether the contents of a cluster are of interest. Preliminary experiments and evaluations are conducted, and the experimental results show that the proposed approach is effective and promising for helping Arabic users browse search results quickly.
Finally, a recommended platform for Arabic Web search results clustering is established, based on the Google search engine API.
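To make the base-cluster and merging steps mentioned in the abstract concrete, here is a minimal Python sketch. It is not the scheme proposed in the paper. It assumes the commonly cited STC merge rule (two base clusters are merged when each shares more than half of its documents with the other) and, for brevity, swaps the suffix tree for plain word n-gram counting. The function names, the whitespace tokenizer, and the parameters `max_len`, `min_docs`, and `threshold` are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each word n-gram (up to max_len words) to the ids of snippets containing it.

    Stand-in for the suffix tree: a phrase shared by at least min_docs
    snippets becomes a base cluster. Tokenization is a plain whitespace split.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap by more than threshold in both directions."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    # Collect the connected components as final clusters.
    merged = defaultdict(lambda: {"labels": [], "docs": set()})
    for p in phrases:
        root = find(p)
        merged[root]["labels"].append(" ".join(p))
        merged[root]["docs"] |= clusters[p]
    return list(merged.values())


if __name__ == "__main__":
    snippets = [
        "arabic web search",
        "arabic web clustering",
        "suffix tree clustering",
        "suffix tree algorithm",
    ]
    for cluster in merge_clusters(base_clusters(snippets)):
        print(sorted(cluster["docs"]), cluster["labels"])
```

Because merging depends only on how document sets overlap, any pre-processing that changes which snippets share a token also changes which base clusters are merged. The sketch does not reproduce the paper's stemming example.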
Similar resources
CONTENTS iii Contents List of Figures vi
The increasing amount of data on the Web bears potential for addressing complex information needs more effectively. Instead of keyword search and browsing along links between results, users can specify their needs in terms of complex queries and obtain precise answers right away. However, browsing is also essential on the Web of data as users might not always know a specific query language and ...
Search Result Clustering Using Label Language Model
Search results clustering helps users to browse the search results and locate what they are looking for. In the search result clustering, the label selection which annotates a meaningful phrase for each cluster becomes the most fundamental issue. In this paper, we present a new method of using the language modeling approach over Dmoz for label selection, namely label language model. Experimenta...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
A User-Centered Approach to Evaluating Topic Models
This paper evaluates the automatic creation of personal topic models using two language model-based clustering techniques. The results of these methods are compared with user-defined topic classes of web pages from personal web browsing histories from a 5-week period. The histories and topics were gathered during a naturalistic case study of the online information search and use behavior of two...
Supporting non-English Web searching: An experiment on the Spanish business and the Arabic medical intelligence portals
Although non-English-speaking online populations are growing rapidly, support for searching non-English Web content is much weaker than for English content. Prior research has implicitly assumed English to be the primary language used on the Web, but this is not the case for many non-English-speaking regions. This research proposes a language-independent approach that uses meta-searching, stati...
Journal: CoRR
Volume: abs/1305.2755
Publication date: 2013